Support compression= in DataFrame.to_json #17634

Merged
1 commit merged into rapidsai:branch-25.02 on Dec 19, 2024

Conversation

mroeschke (Contributor)

Description

closes #17564
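
A minimal usage sketch of the feature this PR adds. The call signature mirrors pandas, but the file name and the orient/lines arguments below are illustrative assumptions, not taken from the PR itself:

import cudf

df = cudf.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Write gzip-compressed JSON Lines; compression= is the keyword added by this PR.
df.to_json("data.jsonl.gz", orient="records", lines=True, compression="gzip")

# Round-trip through cudf.read_json, which already accepts compressed input.
roundtrip = cudf.read_json("data.jsonl.gz", lines=True, compression="gzip")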

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke added labels Python (Affects Python cuDF API), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) on Dec 19, 2024
@mroeschke self-assigned this on Dec 19, 2024
@mroeschke requested a review from a team as a code owner on December 19, 2024, 19:32
@github-actions bot added the pylibcudf (Issues specific to the pylibcudf package) label on Dec 19, 2024
@@ -54,6 +54,22 @@ def _get_cudf_schema_element_from_dtype(
return lib_type, child_types


def _to_plc_compression(
Reviewer (Contributor):

Do we use something like this in other places? If so, we could reuse it.

mroeschke (Contributor, Author):

It looks like we have something similar that we use for Parquet, but given that each format supports a different set of compressions (and maps slightly differently from Python), I can take a look at this in a follow-up.
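
For reference, a rough sketch of what a helper along these lines could do. The helper name comes from the diff above; the accepted strings and the exact mapping are assumptions, since the PR's implementation isn't shown here, and the enum members are assumed to match pylibcudf's CompressionType:

import pylibcudf as plc

def _to_plc_compression(compression):
    # Map the user-facing compression string to a pylibcudf CompressionType
    # (illustrative mapping; gzip is the only compression the JSON writer supports).
    mapping = {
        None: plc.io.types.CompressionType.NONE,
        "infer": plc.io.types.CompressionType.AUTO,
        "gzip": plc.io.types.CompressionType.GZIP,
    }
    try:
        return mapping[compression]
    except KeyError:
        raise ValueError(f"Unsupported compression type: {compression}")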

@@ -1453,3 +1453,12 @@ def test_chunked_json_reader():
with cudf.option_context("io.json.low_memory", True):
gdf = cudf.read_json(buf, lines=True)
assert_eq(df, gdf)


@pytest.mark.parametrize("compression", ["gzip", None])
Reviewer (Contributor):

There's a compression_params parametrization you could use:

Suggested change
@pytest.mark.parametrize("compression", ["gzip", None])
@pytest.mark.parametrize("compression", compression_params)

mroeschke (Contributor, Author):

Yeah, I tried this originally, but it appears gzip is the only compression supported for writing JSON.
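
A sketch of roughly what the parametrized round-trip test could look like under that constraint. The test name and the use of tmp_path are assumptions for illustration, not the PR's actual test body:

import pytest

import cudf
from cudf.testing import assert_eq

@pytest.mark.parametrize("compression", ["gzip", None])
def test_to_json_compression(tmp_path, compression):
    # Write compressed (or plain) JSON Lines and read it back.
    df = cudf.DataFrame({"a": [1, 2, 3]})
    path = tmp_path / "test.json"
    df.to_json(path, orient="records", lines=True, compression=compression)
    got = cudf.read_json(path, lines=True, compression=compression)
    assert_eq(df, got)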

@mroeschke (Contributor, Author):

/merge

@rapids-bot rapids-bot bot merged commit 550ea35 into rapidsai:branch-25.02 Dec 19, 2024
109 checks passed
@mroeschke mroeschke deleted the enh/to_json/compression branch December 19, 2024 21:52
Successfully merging this pull request may close these issues:

[FEA] Python-level support for writing compressed JSON format (#17564)